Group Task

Marco Boso, Diego Paroli, Yijia Lin, Bradley McKenzie, Linghan Zheng, Jia Lin, & Isabel Monge

Required packages

We used the following packages for our analysis:

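rm(list = ls())  # start from a clean workspace

# Data wrangling (tidyverse already attaches dplyr, tidyr, stringr, ggplot2 and forcats)
library(tidyverse)
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
library(DataExplorer)
library(glue)

# Plotting
library(ggplot2)
library(forcats)
library(ggrepel)
library(scales)
library(ggpubr)

# Maps and interactive graphics
library(mapSpain)
library(sf)
library(ggiraph)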

Data cleaning highlights

  • The key challenge was high dimensionality: the election data alone had 471 columns. To handle this, we relied on filter, pivot_longer, group_by and summarise.

  • Using str_detect, we were able to track most of the branches/federations of the main parties and group them together.

  • In the end we created three aggregated tidy data files to work with, based on survey data, election data and election turnout. The files all share standardized party names and codes, as well as election dates, which lets us join them when needed in our analysis.

    • Timing → date
    • Location information → code_community (autonomous community), code_province, code_municipality, municipality, population
    • General election information → polling_stations, participation_1, participation_2, blank_votes, null_votes, valid_votes, total_votes
    • Party votes received → party_main, party_code, party_ballots
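The cleaning steps above can be illustrated with a minimal sketch. The toy table, its party column names and the regular expressions are assumptions made for illustration only; they are not the patterns used in the actual analysis.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wide election table: one ballot column per party list
election_wide <- tibble(
  date = as.Date(c("2019-04-28", "2019-04-28")),
  code_municipality = c("001", "002"),
  `PARTIDO POPULAR` = c(120, 80),
  `PARTIDO SOCIALISTA DE EUSKADI` = c(150, 60),
  `PARTIT DELS SOCIALISTES DE CATALUNYA` = c(0, 40)
)

election_tidy <- election_wide |>
  # Wide to long: one row per municipality and party list
  pivot_longer(
    cols = -c(date, code_municipality),
    names_to = "party_raw",
    values_to = "ballots"
  ) |>
  # Drop lists that received no ballots in a municipality
  filter(ballots > 0) |>
  # Group branches/federations under the main party (illustrative patterns)
  mutate(
    party_main = case_when(
      str_detect(party_raw, "PARTIDO POPULAR") ~ "PP",
      str_detect(party_raw, "SOCIALISTA|SOCIALISTES") ~ "PSOE",
      TRUE ~ "OTHER"
    )
  ) |>
  group_by(date, code_municipality, party_main) |>
  summarise(party_ballots = sum(ballots), .groups = "drop")

print(election_tidy)
```

Summing after the regrouping is what merges the ballots of the regional federations into a single row per main party.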

Q1 : Dominant parties in municipalities with 100k+ inhabitants

Only 48 municipalities had more than 100k inhabitants. Two of them, Cádiz and Dos Hermanas, fluctuated around the threshold, exceeding 100k in some periods and falling below it in others. We built a table with the winners and their share of the vote, then plotted it over time. PP has a stronghold in some areas.
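A minimal sketch of how such a winners table could be built. The toy data and the `votes` object are hypothetical, and the share is computed over the ballots of the listed parties only, which may differ from the denominator used in the original table.

```r
library(dplyr)

# Hypothetical tidy votes: one row per election, municipality and party
votes <- tibble(
  date = as.Date(rep(c("2016-06-26", "2019-04-28"), each = 4)),
  municipality = rep(c("A", "A", "B", "B"), times = 2),
  population = rep(c(150000, 150000, 90000, 90000), times = 2),
  party_main = rep(c("PP", "PSOE"), times = 4),
  party_ballots = c(500, 400, 100, 300, 350, 450, 200, 100)
)

winners <- votes |>
  # Keep only municipalities above the 100k threshold in that election
  filter(population > 100000) |>
  group_by(date, municipality) |>
  # Share of each party within the municipality and election
  mutate(vote_share = 100 * party_ballots / sum(party_ballots)) |>
  # Keep the most voted party per municipality and election
  slice_max(party_ballots, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(date, municipality, winner = party_main, vote_share)

print(winners)
```

Filtering per election (rather than once per municipality) keeps cities that cross the threshold only in some periods, as happens with Cádiz and Dos Hermanas.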

Q2 : Primary Runner-Up Parties to PSOE and PP in Elections

The original code was a loop that analyzed the second-party votes across all elections, breaking down the results by the size of the cities (in terms of population) where people voted.

Second most voted party by population when the first is PSOE

Q2 : Primary Runner-Up Parties to PSOE and PP in Elections

# Step 1: Create population categories with ordered factors for PP
pp_first <- pp_first |> 
  mutate(
    population_category = factor(
      case_when(
        population < 10000 ~ "<10.000",
        population >= 10000 & population < 50000 ~ ">= 10.000 & < 50.000",
        population >= 50000 & population < 100000 ~ ">= 50.000 & < 100.000",
        population >= 100000 & population < 500000 ~ ">= 100.000 & < 500.000",
        population >= 500000 & population < 1000000 ~ ">= 500.000 & < 1.000.000",
        population >= 1000000 ~ ">= 1.000.000"
      ),
      levels = c("<10.000", ">= 10.000 & < 50.000", ">= 50.000 & < 100.000", 
                 ">= 100.000 & < 500.000", ">= 500.000 & < 1.000.000", ">= 1.000.000")
    )
  )

# Step 2: Loop through elections and create a plot for each election for PP
unique_dates_pp <- unique(pp_first$date_elec)

# Create a list to store plots for PP
plots_pp <- list()

for (i in seq_along(unique_dates_pp)) {
  # Index into the vector: iterating over Date values directly strips their class
  current_date <- as.Date(unique_dates_pp[i])
  
  # Filter data for the specific election date
  data_filtered <- pp_first |> 
    filter(date_elec == current_date) |> 
    group_by(population_category, second_party) |> 
    summarise(
      total_votes = sum(second_votes, na.rm = TRUE),
      .groups = "drop"
    )
  
  # Create the plot
  plot <- ggplot(data_filtered, aes(x = population_category, 
                                    y = total_votes, 
                                    fill = second_party)) +
    geom_col(position = "dodge", width = 0.7) +
    scale_fill_manual(values = party_colors) +
    scale_y_continuous(labels = scales::comma) + 
    labs(
      title = paste("Second Party by Population for Election on", format(current_date, "%Y-%m-%d")),  # Format the date properly
      x = "Inhabitants per city",
      y = "Total Votes"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
      axis.text.y = element_text(size = 10),
      legend.position = "bottom",
      legend.title = element_blank()  # Remove legend title
    )
  
  # Save the plot to the list
  plots_pp[[as.character(current_date)]] <- plot
}

# Step 3: Display all plots for PP (one at a time)
for (plot_pp in plots_pp) {
  print(plot_pp)
}

Q3 : Impact of low turnout on election outcomes

Low Turnout Areas (<70.7%): PSOE and PP dominate low-turnout areas, securing the highest vote shares. Smaller parties like VOX, CS, and EH-BILDU have a moderate presence, while regional parties and new entrants struggle to gain significant traction.

Q3 : Impact of low turnout on election outcomes

Extreme Low Turnout Areas (<45%): EAJ-PNV leads in extreme low-turnout areas, followed closely by PSOE and EH-BILDU. Traditional parties like PP still maintain influence, while smaller parties show minimal presence.
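A minimal sketch of how party vote shares below a turnout threshold could be computed. The helper `party_share_below()`, the column `turnout_pct` and the toy figures are hypothetical; only the two thresholds (70.7% and 45%) come from the slides.

```r
library(dplyr)

# Hypothetical data: turnout per municipality and ballots per party
turnout_votes <- tibble(
  code_municipality = c("001", "001", "002", "002", "003", "003"),
  turnout_pct = c(75, 75, 60, 60, 40, 40),
  party_main = c("PP", "PSOE", "PP", "PSOE", "PP", "EAJ-PNV"),
  party_ballots = c(100, 100, 300, 500, 50, 150)
)

# Hypothetical helper: vote share by party in municipalities below a threshold
party_share_below <- function(data, threshold) {
  data |>
    filter(turnout_pct < threshold) |>
    group_by(party_main) |>
    summarise(ballots = sum(party_ballots), .groups = "drop") |>
    mutate(vote_share = 100 * ballots / sum(ballots)) |>
    arrange(desc(vote_share))
}

low_turnout <- party_share_below(turnout_votes, 70.7)
extreme_low_turnout <- party_share_below(turnout_votes, 45)

print(low_turnout)
print(extreme_low_turnout)
```

In this sketch the extreme low-turnout municipalities are a subset of the low-turnout ones, so the two tables are not mutually exclusive.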

Q4 : Relationship between census and vote (urban vs. rural)

There is no significant overall correlation between population size (log-transformed) and the percentage of votes for parties. The red line indicates a stable trend with no noticeable changes.

Q4 : Relationship between census and vote (urban vs. rural)

Different parties show varying trends in vote percentages across population scales (log-transformed). PP performs better in less populated municipalities, while PSOE and PODEMOS-IU perform better in more populated ones.
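The per-party relationship can be summarised with a correlation on the log scale. This is a minimal sketch on made-up numbers; the column names `population` and `vote_pct` are assumptions, and the toy data is deliberately built to be perfectly linear in log10(population).

```r
library(dplyr)

# Hypothetical municipality-level vote percentages for two parties
votes_by_size <- tibble(
  population = rep(10^(2:6), times = 2),
  party_main = rep(c("PP", "PSOE"), each = 5),
  vote_pct = c(50, 45, 40, 35, 30,   # falls as population grows
               20, 25, 30, 35, 40)   # rises as population grows
)

cor_by_party <- votes_by_size |>
  group_by(party_main) |>
  summarise(
    # Pearson correlation between log10(population) and vote percentage
    cor_log_pop = cor(log10(population), vote_pct),
    .groups = "drop"
  )

print(cor_by_party)
```

A negative coefficient corresponds to stronger support in smaller municipalities, a positive one to stronger support in larger ones.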

Q4 : Relationship between census and vote (urban vs. rural)

Different parties show significant differences in support between rural and urban areas. EH-BILDU and BNG have higher support in rural areas, while MP, CS and PODEMOS-IU have stronger support in urban areas.

Q5 : Polling error

Q5 : Polling error vs days before election

Are polls more precise as we get closer to the election?

Q5 : Polling error vs surveys size

Are surveys conducted on a bigger sample more precise?
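Both questions rely on two derived quantities per poll: the absolute error against the final result and the number of days between fieldwork and election day. A minimal sketch follows; all column names (`field_date_to`, `size`, `estimated_share`, `actual_share`) and all values are hypothetical.

```r
library(dplyr)

# Hypothetical poll estimates for one party in one election
polls <- tibble(
  date_elec = as.Date("2019-04-28"),
  field_date_to = as.Date(c("2019-04-20", "2019-03-01")),
  size = c(1000, 2500),
  estimated_share = c(28, 25),
  actual_share = c(28.7, 28.7)
) |>
  mutate(
    # Days between the end of fieldwork and election day
    days_before = as.numeric(date_elec - field_date_to),
    # Absolute polling error in percentage points
    abs_error = abs(estimated_share - actual_share)
  )

print(polls)
```

Plotting `abs_error` against `days_before` or against `size` then addresses each question in turn.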

Q6 : Polling house accuracy

Which polling houses got it right the most and which ones deviated the most from the results?

Measurement criteria: weighted mean absolute error (WMAE).
Weighting: a weight of 0.7 for the five parties receiving the most votes in each general election, and a weight of 0.3 for the remaining parties.
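Written out, the criterion amounts to the following expression. The notation is ours, not from the original slides: $R_p$ denotes the set of estimates (one per party and election) published by pollster $p$, $\hat{v}_i$ the poll estimate and $v_i$ the actual national share, both in percent.

```latex
\[
\mathrm{WMAE}_p
  = \frac{\sum_{i \in R_p} w_i \,\lvert \hat{v}_i - v_i \rvert}
         {\sum_{i \in R_p} w_i},
\qquad
w_i =
\begin{cases}
  0.7 & \text{if the party ranks in the top five of its election,} \\
  0.3 & \text{otherwise.}
\end{cases}
\]
```

Because the weighted errors are divided by the sum of the weights, pollsters that published different numbers of estimates remain comparable.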

final_election_summary_v2 <- final_election_summary |> 
  group_by(date_elec) |>  
  arrange(date_elec, desc(national_share)) |>  
  mutate(rank_pos = dense_rank(-national_share)) |>  
  mutate(weight = if_else(rank_pos <= 5, 0.7, 0.3)) |>  
  ungroup()

final_election_summary_v3 <- final_election_summary_v2 |> 
  mutate(national_share=national_share*100) |> 
  mutate(error = abs(votes - national_share)) 
  
wmae <- final_election_summary_v3 |> 
  group_by(pollster) |> 
  summarise(WMAE = sum(weight * error) / sum(weight)) |> 
  ungroup() |> 
  mutate(pollster = fct_reorder(pollster, WMAE, .desc = FALSE))


# Five most accurate pollsters (lowest WMAE)
head(wmae$pollster[order(wmae$WMAE)], 5)
#> [1] IMOP         SOCIOMÉTRICA APPEND       METRA SEIS   VOX PÚBLICA
#> 46 Levels: IMOP SOCIOMÉTRICA APPEND METRA SEIS VOX PÚBLICA ... NETQUEST

# Five least accurate pollsters (highest WMAE)
tail(wmae$pollster[order(wmae$WMAE)], 5)
#> [1] DYM           SIMPLE LÓGICA METROSCOPIA   MYWORD        NETQUEST
#> 46 Levels: IMOP SOCIOMÉTRICA APPEND METRA SEIS VOX PÚBLICA ... NETQUEST

Q6 : Visualization

Exploratory questions

Q7 : Survey projection accuracy over time

Trends in survey prediction biases over time, focusing on bias evolution and accuracy for each party.

Q8 : Turnout rate over time

How has the turnout rate changed over time? And within each election year, how are turnout rates correlated with the municipalities’ populations?

Turnout shows no clear trend over time. However, the two ‘snap’ elections in 2016 and 2019 were each held shortly after the previous election, and turnout in both was markedly lower than in their predecessors. This suggests that such ‘emergency’ elections may reduce citizens’ willingness to vote again.

Q8 : Turnout rate correlated with the municipalities’ populations over time

Apart from the election year and its specific context, how do turnout rates correlate with municipalities’ populations within each election year?
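A minimal sketch of both turnout measures: the aggregate rate per election and the within-election correlation with population. The `census` column (eligible voters) and all figures are hypothetical, and turnout is assumed here to be total votes divided by census.

```r
library(dplyr)

# Hypothetical municipality-level turnout data for two elections
turnout <- tibble(
  date = as.Date(rep(c("2015-12-20", "2016-06-26"), each = 3)),
  population = rep(c(1000, 10000, 100000), times = 2),
  census = rep(c(800, 8000, 80000), times = 2),
  total_votes = c(640, 6000, 56000, 600, 5600, 52000)
)

turnout_by_election <- turnout |>
  # Turnout per municipality (assumed definition: votes / eligible voters)
  mutate(turnout_pct = 100 * total_votes / census) |>
  group_by(date) |>
  summarise(
    # Aggregate turnout across all municipalities in the election
    national_turnout = 100 * sum(total_votes) / sum(census),
    # Correlation between municipality size (log scale) and turnout
    cor_log_pop = cor(log10(population), turnout_pct),
    .groups = "drop"
  )

print(turnout_by_election)
```

Computing the correlation separately for each election keeps the election-specific context out of the comparison between small and large municipalities.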

Q9 : Electoral support for small parties over time

Support for smaller parties increased significantly over time, peaking across most communities in April 2019.
